A Study of Parentheticals in Discourse Corpora - Implications for NLG Systems

Authors

  • Eva Banik
  • Alan Lee
Abstract

This paper presents a corpus study of parenthetical constructions in two different corpora: the Penn Discourse Treebank (PDTB; PDTB Group, 2008) and the RST Discourse Treebank (Carlson et al., 2001). The motivation for the study is to gain a better understanding of the rhetorical properties of parentheticals in order to enable a natural language generation system to produce parentheticals as part of a rhetorically well-formed output. We argue that there is a correlation between syntactic and rhetorical types of parentheticals and establish two main categories: ELABORATION/EXPANSION-type NP-modifier parentheticals and NON-ELABORATION/EXPANSION-type VP- or S-modifier parentheticals. We show several strategies for extracting these from the two corpora and discuss how the seemingly contradictory results obtained can be reconciled in light of the rhetorical and syntactic properties of parentheticals as well as the decisions taken in the annotation guidelines.

Similar resources

The Penn Discourse TreeBank as a Resource for Natural Language Generation

While many advances have been made in Natural Language Generation (NLG), the scope of the field has been somewhat restricted because of the lack of annotated corpora from which properties of texts can be automatically acquired and applied towards the development of generation systems. In this paper, we describe how the Penn Discourse TreeBank (PDTB) can serve as a valuable large scale annotated...

A Cross-linguistic and Cross-cultural Study of Epistemic Modality Markers in Linguistics Research Articles

Epistemic modality devices are believed to be one of the prominent characteristics of research articles, a genre commonly used among members of the academic community. Considering the importance of such devices in producing and comprehending scientific discourse, this study aimed to investigate, cross-culturally and cross-linguistically, epistemic modality markers as an important subcategory...

Genre Analysis of ELT and Nursing Academic Written Discourse through Introduction

Since Swales’ (1981, 1990) work on the CARS model of the move structure of research articles, many genre-analysis studies have been carried out, and work on different parts of research articles in various disciplines has produced a considerable literature. This study aims to investigate the rhetorical structure of the Introduction sections of articles in two fields of English Language Teaching (...

The multifunctionality of epistemic parentheticals in discourse: prosodic cues to the semantic-pragmatic boundary

The aim of this study is to identify the relation between the interpretation of epistemic parentheticals in discourse and their prosodic realisation. Data drawn from a corpus of British English speech suggests that epistemic parentheticals (comment clauses such as I think, I believe) convey a spectrum of meaning from propositional to interpersonal. They have long been categorised simply as sen...

What is in a text and what does it do: Qualitative Evaluations of an NLG system - the BT-Nurse - using content analysis and discourse analysis

Evaluations of NLG systems are generally quantitative, that is, based on corpus comparison statistics and/or results of experiments with people. Outcomes of such evaluations are important in demonstrating whether or not an NLG system is successful, but leave gaps in understanding why this is the case. Alternatively, qualitative evaluations carried out by experts provide knowledge on where a syst...

Publication date: 2008